Genome editing using Campylobacter jejuni CRISPR/CAS system-derived RGEN

ABSTRACT

The disclosure provided herewith relates to a Campylobacter jejuni CRISPR/CAS system-derived RGEN and a use thereof.

CROSS-REFERENCE TO RELATED APPLICATION(S)

This application is a Continuation Application of International Application No. PCT/KR2015/008269, filed Aug. 6, 2015, designating the United States of America, which claims the benefit of U.S. Provisional Application No. 62/033,852, filed Aug. 6, 2014, which applications are incorporated herein by reference in their entirety.

REFERENCE TO SEQUENCE LISTING

The contents of the text file named “52029-501C01US_ST25.txt” which was created on Jan. 31, 2017, and is 25641 bytes, are hereby incorporated by reference in their entireties.

TECHNICAL FIELD

The disclosure provided herewith relates to a Campylobacter jejuni CRISPR/CAS system-derived, RNA-guided engineered nuclease (RGEN) and methods for using the same.

BACKGROUND ART

Engineered nucleases can be used to effectively manipulate genes in living cells or whole organisms by creating site-specific double-strand breaks at desired locations in the genome (Nat Rev Genet, 2014. 15(5): p. 321-34). Engineered nucleases, which comprise a DNA-binding domain and a nuclease domain customized for type II restriction enzymes, have a broad spectrum of genome engineering applications in the biotechnology and medical fields as well as various other industries. More recently, a more potent RGEN platform has been developed based on the CRISPR/CAS9 bacterial adaptive immune system.

The sequence that an RGEN can target is limited by a protospacer adjacent motif (PAM), which is a DNA sequence immediately following the DNA sequence targeted by the Cas9 nuclease. The PAM sequence was not previously reprogrammable in the CRISPR bacterial adaptive immune system. The canonical PAM comprises the sequence 5′-NGG-3′ and is associated with the RGEN derived from the CAS9 nuclease of Streptococcus pyogenes. Hence, the GG motif is a prerequisite for DNA recognition by the RGEN. To expand the sequences available for use as PAMs, attempts have been made to isolate RGENs with versatile PAMs from different bacterial species. In fact, different PAMs have been found to be associated with the CAS9 proteins of the bacteria Streptococcus thermophilus (PAM: NNAGAAW) and Neisseria meningitidis (PAM: NNNNGATT), widening the range of selection in determining RGEN target loci.

SUMMARY

As described herein, intensive and thorough research into the development of RGENs from bacteria other than Streptococcus pyogenes has resulted in the discovery that a Cas protein derived from Campylobacter jejuni (C. jejuni) specifically recognizes an NNNNRYAC sequence, which can be used as a PAM in the targeting of a DNA of interest. Further, a guide RNA can be engineered for optimal targeting of a DNA, thereby resulting in efficient genome editing, transcription regulation, and separation of a DNA of interest.

Accordingly, in one aspect, the present invention provides a method for targeting a DNA sequence comprising a PAM sequence of SEQ ID NO: 1, the method comprising introducing a Cas protein that recognizes the PAM sequence of SEQ ID NO: 1, or a nucleic acid encoding the Cas protein into a cell.

In another aspect, the present invention provides an isolated guide RNA comprising a sequence capable of forming a duplex (forming a base pair or hybridizing) with a complementary strand of a target DNA sequence of interest adjacent to the PAM sequence of SEQ ID NO: 1, or a composition comprising the same.

In still another aspect, the disclosure provided herewith provides a CRISPR-CAS system, comprising: (i) a guide RNA comprising a sequence capable of duplexing with a target DNA sequence adjacent to the PAM sequence of NNNNRYAC (SEQ ID NO: 1), or DNA encoding the guide RNA, and (ii) a Cas protein recognizing the NNNNRYAC sequence (SEQ ID NO: 1), or a nucleic acid encoding the Cas protein.

In still another aspect, the disclosure provided herewith provides a recombinant viral vector, comprising (i) an expression cassette for a guide RNA comprising a sequence capable of forming a duplex with a target DNA sequence adjacent to the PAM sequence NNNNRYAC (SEQ ID NO: 1), and (ii) an expression cassette for a Cas protein recognizing the PAM sequence NNNNRYAC (SEQ ID NO: 1).

In still another aspect, the disclosure provides an isolated guide RNA comprising a sequence, 21-23 bp in length, capable of forming a duplex with a complementary strand of a target DNA sequence, or a composition comprising the same.

In still another aspect, the disclosure provides an isolated guide RNA, comprising: a first region comprising a sequence capable of forming a duplex with a complementary strand of a target DNA sequence, and a second region comprising a stem-loop structure characterized by a stem 13-18 bp in length, or a composition comprising the isolated guide RNA.

In still another aspect, the disclosure provides an isolated guide RNA, comprising: a first region comprising a sequence capable of forming a duplex with a complementary strand of a target DNA sequence, and a second region comprising a stem-loop structure characterized by a loop 5-10 bp in length, or a composition comprising the isolated guide RNA.

In still another aspect, the disclosure provided herewith provides a method of genome editing in a cell, comprising introducing an isolated guide RNA or a DNA encoding the isolated guide RNA, along with a Cas protein or a nucleic acid encoding the Cas protein, into the cell.

In still another aspect, the disclosure provides a method of cleaving a target DNA in a cell, comprising introducing an isolated guide RNA or a DNA encoding the isolated guide RNA, and a Cas protein or a nucleic acid encoding the Cas protein into the cell.

In still another aspect, the disclosure provides a method for preparing a target DNA-recognizing sequence of a guide RNA, comprising: (i) identifying the presence of a PAM sequence NNNNRYAC (SEQ ID NO: 1) in a given sequence; and (ii) determining a sequence located upstream of the PAM sequence NNNNRYAC (SEQ ID NO: 1) as being recognizable by a guide RNA, if the presence of the PAM sequence is identified in step (i).
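Steps (i) and (ii) of this method can be illustrated with a simple sequence scan. The following is a minimal sketch only, not the procedure used in the disclosure: the function name `find_targets`, the default spacer length of 22 nt (chosen from the 21-23 range mentioned elsewhere in this document), and the restriction to the given strand are all assumptions made for illustration.

```python
import re

# NNNNRYAC written out as a regular expression (N = any base, R = A/G, Y = C/T).
# The lookahead lets overlapping PAM occurrences be reported as well.
PAM_PATTERN = re.compile(r"(?=([ACGT]{4}[AG][CT]AC))")


def find_targets(sequence, spacer_len=22):
    """Return (upstream_sequence, pam) pairs found on the given strand.

    Hypothetical helper: step (i) locates each NNNNRYAC motif, and
    step (ii) takes the spacer_len bases immediately upstream of it.
    Motifs too close to the 5' end to have a full upstream window are skipped.
    """
    sequence = sequence.upper()
    targets = []
    for match in PAM_PATTERN.finditer(sequence):
        pam_start = match.start(1)
        if pam_start >= spacer_len:
            upstream = sequence[pam_start - spacer_len:pam_start]
            targets.append((upstream, match.group(1)))
    return targets
```

A fuller tool would presumably also scan the reverse complement strand and filter candidates for off-target similarity; those steps are omitted here.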

In still another aspect, the disclosure provides a method for isolating a DNA of interest, comprising: (i) introducing a guide RNA or a DNA encoding the guide RNA, along with a deactivated Cas protein or a nucleic acid encoding the deactivated Cas protein, into a cell, to allow the guide RNA and the deactivated Cas protein to form a complex together with the DNA of interest comprising a target DNA sequence; and (ii) separating the complex from a sample.

In still another aspect, the disclosure provides a method for Cas-mediated gene expression regulation in a DNA of interest comprising a target DNA sequence, comprising introducing an isolated guide RNA specifically recognizing the target DNA sequence or a DNA encoding the guide RNA, and a deactivated Cas protein fused to a transcription effector domain or a nucleic acid encoding the deactivated Cas protein into a cell.

As described above, in some embodiments, the CRISPR/Cas system can be effectively used for targeting a target DNA, thereby achieving genome editing, transcription regulation, and isolation of a DNA of interest.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 depicts a schematic diagram of a C. jejuni Cas9 expression vector. The vector is designed so that a humanized Cas9 protein is expressed under the regulation of a CMV promoter and is provided with a nuclear localization signal (NLS) and an HA tag at a C-terminal region.

FIG. 2A and FIG. 2B depict the experiments for C. jejuni RGEN-induced mutation in an endogenous human AAVS1 target locus. FIG. 2A shows that RGEN-driven chromosomal mutations were detected using a T7E1 assay. Asterisks (*) indicate DNA bands that are anticipated to be cleaved by T7E1. HEK293 wild-type (wt) gDNA was used as a negative control (−). A previously proven RGEN was used as a positive control (+). FIG. 2B shows DNA sequences of hAAVS1 mutant clones. Target sequence regions complementary to chimeric RNA are shown in bold. PAM sequences recognized by CAS9 are underlined. The WT sequence of FIG. 2B is represented by SEQ ID NO: 4, the (−2, x1) sequence by SEQ ID NO: 5, and the (−1, x1) sequence by SEQ ID NO: 6.

FIG. 3A and FIG. 3B show the experiments for C. jejuni RGEN-induced mutation in an endogenous mouse ROSA26 (mROSA) target locus. FIG. 3A shows that RGEN-driven chromosomal mutations were detected using a T7E1 assay. Asterisks (*) indicate DNA bands that are anticipated to be cleaved by T7E1. NIH3T3 wt gDNA was used as a negative control (−). A previously proven RGEN was used as a positive control (+). FIG. 3B shows DNA sequences of mROSA mutant clones. Target sequence regions complementary to chimeric RNA are shown in bold. PAM sequences recognized by C. jejuni CAS9 are underlined. The WT sequence of FIG. 3B is represented by SEQ ID NO: 7, the (−1, x1) sequence by SEQ ID NO: 8, and the (+1, x1) sequence by SEQ ID NO: 9.

FIG. 4 shows certain mutations induced in endogenous AAVS1 target loci by a mutant C. jejuni sgRNA structure. RGEN-driven chromosomal mutations were detected using a T7E1 assay. Asterisks (*) indicate DNA bands that are anticipated to be cleaved by T7E1. HEK293 wt gDNA was used as a negative control (−). A previously proven RGEN was used as a positive control (+).

FIG. 5A to FIG. 5C illustrate the optimization of the spacer length of sgRNAs. FIG. 5A shows various sgRNA structures. Additional nucleotides immediately upstream of the 5′ end of the spacer of sgRNA are underlined, wherein small letters represent mismatched nucleotides with regard to the target sequence. The PAM sequence is boxed. In FIG. 5A, the target sequence is represented by SEQ ID NO: 10, GX19 by SEQ ID NO: 11, GX20 by SEQ ID NO: 12, GX21 by SEQ ID NO: 13, GX22 by SEQ ID NO: 14, GX23 by SEQ ID NO: 15, GGX20 by SEQ ID NO: 16, and GGGX20 by SEQ ID NO: 17. FIG. 5B shows target sites of sgRNA wherein sequences for hAAVS-CJ1, hAAVS-NRG1, hAAVS-NRG3, and hAAVS-NRGS are represented by SEQ ID NOs: 18, 19, 20, and 21, respectively. FIG. 5C shows the efficiency of the sgRNA constructs in inducing RGEN-mediated mutations. Briefly, sgRNAs were constructed to have various lengths of spacers (19-23 bp) and various numbers of additional G (guanine) residues present immediately upstream of the spacer. Each of the sgRNAs shown in FIG. 5A was designed for 4 target sites of the human AAVS1 locus (FIG. 5B), and was delivered to human 293 cells. Subsequently, mutations induced by NHEJ were identified in the cells. In this embodiment, the target sites were amplified by PCR, and analyzed by deep sequencing using MiSeq (Illumina) to detect the mutations. On the whole, genome editing (mutation) frequency was increased when the recognition sequence was 21-23 bp in length or was provided with 2 or 3 additional G residues at the 5′ end thereof, compared to GX19 or GX20 used in C. jejuni or other species.

FIG. 6 is a graph showing the activity of C. jejuni CRISPR/Cas9 in which the AAVS1-CJ1 locus is inserted into a surrogate reporter. Relative to the activity (100) detected for the ACAC sequence at the PAM site, activities were calculated when different nucleotides were substituted at each position. At the first position, G as well as A guaranteed high activity. T as well as C were effective at the second position. However, only A and C exhibited activity at the third and fourth positions, respectively. Therefore, NNNN-A/G-C/T-A-C (or NNNNRYAC, SEQ ID NO: 1, wherein A/G=R, C/T=Y) is inferred to be an optimal PAM sequence at least in some embodiments.

FIG. 7 shows a consensus logo for a potential off-target sequence of hAAVS1-CJ1 sgRNA, as identified by the Digenome-Seq analysis.

FIG. 8 shows the test results for the PAM sequences of C. jejuni Cas9. Seven target sites of NNNNRYAC (SEQ ID NO: 1) were analyzed for mutation efficiency. hAAVS1-RYN1-7: ratio of mutation at each site in sgRNA/Cas9-treated cells, WT1-7: ratio of mutation at each site in genomic DNA of mock-treated cells.

FIG. 9 is a schematic diagram showing the structure of a C. jejuni CRISPR/Cas9 expression AAV vector.

FIG. 10 shows the genome editing, performed by C. jejuni CRISPR/Cas9 AAV (adeno-associated virus), in the Rosa26 locus. Briefly, C2C12 cells were infected with a recombinant AAV vector carrying both Rosa26-sgRNA and C. jejuni Cas9 at different MOI (multiplicity of infection). At 3, 5, 7, 10, and 14 days post-infection, genomic DNA was isolated, and analyzed for mutation ratio by deep sequencing.

DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

One embodiment of the present invention provides a method for targeting a DNA sequence of interest, comprising introducing into a cell a Cas protein or a nucleic acid encoding the same.

In detail, in accordance with one aspect, the present disclosure provides a method for targeting a DNA sequence comprising a PAM (protospacer adjacent motif) sequence of SEQ ID NO: 1, comprising introducing a Cas protein that recognizes the PAM sequence NNNNRYAC of SEQ ID NO: 1 or a nucleic acid encoding the Cas protein into a cell. In SEQ ID NO: 1, according to the IUPAC nomenclature, “N” refers to any nucleotide, for example, selected from A, C, G, and T; “R” refers to purine (A/G); and “Y” refers to pyrimidine (C/T).
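The IUPAC reading of SEQ ID NO: 1 given above can be made concrete by expanding each symbol into the bases it stands for. The snippet below is an illustrative sketch under stated assumptions: the helper names `iupac_to_regex` and `is_pam` are hypothetical, and the lookup table covers only the symbols that occur in NNNNRYAC rather than the full IUPAC alphabet.

```python
import re

# Only the IUPAC symbols needed for SEQ ID NO: 1; the full code has more.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",  # any nucleotide
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
}


def iupac_to_regex(motif):
    """Expand an IUPAC motif into an equivalent regular expression string."""
    return "".join(IUPAC_DNA[symbol] for symbol in motif.upper())


PAM_REGEX = re.compile(iupac_to_regex("NNNNRYAC"))


def is_pam(candidate):
    """Return True if an 8-nt DNA string matches NNNNRYAC exactly."""
    return PAM_REGEX.fullmatch(candidate.upper()) is not None
```

For instance, `is_pam("TTTTACAC")` holds because position 5 is a purine (A) and position 6 a pyrimidine (C), whereas `is_pam("TTTTCCAC")` does not, since C is not a purine.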

In an aspect of the present disclosure, the method may further comprise introducing a guide RNA comprising a sequence capable of forming a duplex with a complementary strand of a DNA of interest (target DNA) adjacent to the PAM sequence of SEQ ID NO: 1. The guide RNA can be introduced simultaneously or sequentially with the Cas protein that recognizes the PAM sequence of SEQ ID NO: 1 or the nucleic acid encoding the Cas protein. As used herein, the term “targeting” is intended to encompass the binding of a Cas protein to a DNA sequence of interest, either with or without DNA cleavage. The terminology that will be described later is applicable to all embodiments of the present disclosure, and can be used in combination.

The Cas protein can perform its activity after forming a complex with CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA). The Cas protein may exhibit endonuclease or nickase activity.

Information related to Cas proteins or genes encoding Cas proteins can be found in well-known databases, such as GenBank of the NCBI (National Center for Biotechnology Information). According to one embodiment, the Cas protein may be a Cas9 protein. In another embodiment, the Cas protein may be one originating (derived) from a Campylobacter spp. (i.e., the genus Campylobacter) and may particularly be of Campylobacter jejuni in origin. More particularly, the Cas9 protein can be derived from Campylobacter jejuni. In some embodiments of the present disclosure, the Cas protein may comprise the amino acid sequence represented by SEQ ID NO: 22, or may be homologous to the amino acid sequence of SEQ ID NO: 22, retaining the intrinsic activity thereof. For example, without limitation, the Cas protein and its homologous sequences encompassed by the present disclosure may have a sequence identity of at least 50%, 60%, 70%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% to the sequence of SEQ ID NO: 22.

Moreover, the Cas protein, as used in certain embodiments of the present disclosure, is intended to encompass any variant that can serve as an activated endonuclease or nickase in cooperation with a guide RNA, as well as a native protein. The activated endonuclease or nickase may cleave a target DNA, or may be able to perform genome editing with the cleavage function. As for deactivated variants, their functions may be used for regulating transcription or isolating a DNA of interest.

The Cas9 protein variant may be a derivative, variant, or mutant of Cas9 resulting from the substitution of a catalytic aspartate or histidine residue with a different amino acid. For example, the different amino acid may be alanine, but is not limited thereto.

Specifically, the Cas protein, for example, a Cas9 protein derived from C. jejuni may include a substitution of the catalytic aspartic acid (D) at position 8 or the histidine residue (H) at position 559 with an amino acid that differs from the wild type amino acid sequence. In some embodiments, the catalytic aspartic acid (D) at position 8 or the histidine residue (H) at position 559 of the sequence of SEQ ID NO: 22 is substituted with a different amino acid. For example, the different amino acid may be, without limitation, alanine. The Cas9 nuclease variant prepared by introducing a mutation to one active site of the native Cas9 nuclease can act as a nickase in association with a guide RNA. When bound to one guide RNA molecule, two nickase molecules can cleave both strands of a DNA duplex of interest, thereby creating double-strand breaks (DSB). Hence, such variants also belong to the scope of RGENs encompassed by the present disclosure.
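As a purely illustrative sketch of how such a single-residue substitution might be represented in silico, the helper below replaces one residue at a 1-based position after checking that the expected wild-type residue is actually present. The function name `substitute_residue` and the short sequence `TOY_PROTEIN` are hypothetical; the toy sequence is not SEQ ID NO: 22 and merely has an aspartic acid at position 8 so that the example runs.

```python
def substitute_residue(protein, position, expected, new="A"):
    """Replace the residue at a 1-based position with `new` (alanine by default).

    Raises ValueError if the position is out of range or if the residue
    found there is not the expected wild-type residue.
    """
    index = position - 1
    if not 0 <= index < len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[index] != expected:
        raise ValueError(
            f"expected {expected} at position {position}, found {protein[index]}"
        )
    return protein[:index] + new + protein[index + 1:]


# Hypothetical toy sequence with D at position 8 (NOT SEQ ID NO: 22).
TOY_PROTEIN = "MKKLSGGDAAA"
d8a_variant = substitute_residue(TOY_PROTEIN, 8, "D")  # D8A-style substitution
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are easy to make when positions are quoted relative to a sequence listing.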

As used herein, the term “deactivated Cas protein” refers to a Cas nuclease, the function of which is entirely or partially deactivated. The deactivated Cas protein may be abbreviated to dCas. The Cas may be a Cas9 protein. Further, it may originate from Campylobacter spp., and particularly from C. jejuni. Any method may be used in the preparation of the deactivated Cas9 nuclease, so long as it eliminates nuclease activity. For example, a dCAS9 protein can be constructed by introducing mutations into the two above-mentioned active loci of the Cas9 nuclease. The dCAS9 can then act as a DNA-bound complex with a guide RNA, while lacking a DNA cleavage function. Moreover, the dCAS9 protein may have substituents other than the aspartic acid (D) at position 8 and the histidine (H) at position 559. For example, in some embodiments, the dCAS9 protein may have substituents other than the aspartic acid (D) at position 8 and the histidine (H) at position 559 of the sequence of SEQ ID NO: 22. The substituents may be, without limitation, alanine. As used herein, the term “cleavage” refers to the breakage of the covalent backbone of a nucleotide molecule.

In some embodiments of the present disclosure, the Cas protein can be a recombinant protein. The term “recombinant”, as used in conjunction with, for example, a cell, nucleic acid, protein, or vector, refers to the cell, nucleic acid, protein or vector that is modified by the introduction of a heterologous nucleic acid or protein or by the alteration of a native nucleic acid or protein, or which is derived from such a modified cell. Thus, for example, a recombinant Cas protein may be generated by reconstituting a Cas protein-encoding nucleic acid sequence (i.e., a sequence encoding a Cas protein), based on the human codon table.

In some embodiments of the present disclosure, the Cas protein or the nucleic acid encoding the same may be in a form that is allowed to be active within the nucleus.

In some embodiments of the present disclosure, the isolated Cas protein may be in a form that is easy to introduce into cells. For example, the Cas protein may be linked to a cell-penetrating peptide or a protein transduction domain. The protein transduction domain may be, without limitation, poly-arginine or an HIV-derived TAT protein. The present disclosure encompasses various examples of cell-penetrating peptides or protein transduction domains that are well known in the art.

In some embodiments of the present disclosure, the Cas protein or the nucleic acid encoding the same may further comprise a nuclear localization signal (NLS) for transporting the protein or nucleic acid into a nucleus in a cell by nuclear transport. In addition, the nucleic acid encoding the Cas protein may further comprise a nuclear localization signal (NLS) sequence. Thus, the Cas protein-encoding nucleic acid may be present as a component of an expression cassette that may contain, but is not limited to, an NLS sequence as well as a regulatory element, such as a promoter.

In some embodiments of the present disclosure, the Cas protein may be linked with a tag that facilitates separation and/or purification. As a non-limiting example, a small peptide tag, such as a His tag, an HA tag, an S tag, etc., a glutathione S-transferase (GST) tag, or a maltose binding protein (MBP) tag may be used, depending on the purpose.

In some embodiments of the present disclosure, where the Cas protein is associated with target DNA-specific guide RNA, the Cas protein may be collectively termed an RGEN (RNA-Guided Engineered Nuclease). As used herein, the term “RGEN” refers to a nuclease having a target DNA-specific guide RNA and a Cas protein.

For application to cells, according to some embodiments of the present disclosure, the RGEN may have a target DNA-specific guide RNA or a DNA encoding the guide RNA; as well as an isolated Cas protein or a nucleic acid encoding the Cas protein. In this regard, the guide RNA or the DNA encoding the guide RNA may be applied to cells simultaneously or sequentially with the Cas protein or the nucleic acid encoding the Cas protein.

In an aspect of the present disclosure, the RGEN for delivery to cells includes 1) a target DNA-specific guide RNA and an isolated Cas protein, or 2) a DNA encoding the guide RNA or a nucleic acid encoding the Cas protein. Delivery in the form of 1) is designated “RNP delivery.” Examples of the isolated guide RNA may comprise, but are not limited to, in vitro transcribed RNAs.

In some embodiments of the present disclosure, the guide RNA-coding DNA (DNA encoding the guide RNA) and the Cas protein-coding nucleic acid may themselves be used as isolated nucleic acids. Alternatively, without limitation, they may be present in a vector having an expression cassette for expressing the guide RNA and/or the Cas protein. Examples of suitable vectors include a viral vector, a plasmid vector, and an Agrobacterium vector. The viral vector may be exemplified by, but is not limited to, an AAV (adeno-associated virus).

In some embodiments of the present disclosure, without limitation, the guide RNA-coding DNA and the Cas protein-coding nucleic acid may be present separately in respective vectors or together in a single vector.

The foregoing application embodiments of the subject matter can be applied to more exemplary embodiments as described in this specification. In addition, application embodiments that will be described later may be applied in combination with other constitutional elements.

As used herein, the term “guide RNA” may refer to an RNA having specificity to a target DNA (i.e., a target DNA-specific RNA), which can be coupled with a Cas protein to guide the Cas protein to the target DNA.

Moreover, at least in some embodiments, the guide RNA may be designed to be specific for a certain target to be cleaved.

In some embodiments of the present disclosure, the guide RNA may be a dual RNA consisting of two RNAs, that is, a crRNA and a tracrRNA. In other embodiments, the guide RNA may be an sgRNA comprising or consisting of a first region containing a sequence complementary to the target DNA capable of forming a duplex with a complementary strand of the target DNA, and a second region containing a sequence responsible for interacting with the Cas protein. More particularly, the guide RNA may be an sgRNA (single guide RNA or single-stranded guide RNA) synthesized by fusing respective essential portions of a crRNA and a tracrRNA.

In some embodiments of the present disclosure, the sequence capable of forming a duplex with a complementary strand of a target DNA sequence in the guide RNA may range, without limitation, in length from 17 to 23 bp, from 18 to 23 bp, from 19 to 23 bp, particularly from 20 to 23 bp, and more particularly from 21 to 23 bp. The length may be applied to both the dual RNA and the sgRNA, and more particularly to the sgRNA.

In some embodiments of the present disclosure, the guide RNA may comprise one to three, more particularly two or three additional nucleotides just prior to the 5′ end of the sequence capable of forming a duplex with a complementary strand of a target DNA sequence. The nucleotides are selected from among A, T, G, C, and combinations thereof. The guide RNA may comprise, as additional nucleotides, one to three consecutive guanine (G) residues, more preferably, two or three consecutive G residues. This is applied, without limitation, to both dualRNAs and sgRNAs, and more preferably to sgRNAs.
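The addition of 5′ guanine residues described here amounts to a simple string operation on the spacer. The sketch below is a hypothetical illustration only: the function name `add_five_prime_g`, the choice to write the spacer in the RNA alphabet (A, C, G, U), and the upper bound of three residues (taken from the range stated above) are assumptions, not part of the disclosed method.

```python
def add_five_prime_g(spacer_rna, extra_g):
    """Prepend 0-3 guanine residues to the 5' end of an RNA spacer sequence.

    Hypothetical helper: validates that the spacer uses only A, C, G, U and
    that the number of extra G residues is within the 0-3 range.
    """
    spacer_rna = spacer_rna.upper()
    if set(spacer_rna) - set("ACGU"):
        raise ValueError("spacer must contain only A, C, G, and U")
    if not 0 <= extra_g <= 3:
        raise ValueError("extra_g must be between 0 and 3")
    return "G" * extra_g + spacer_rna


# Example with an arbitrary 20-nt placeholder spacer and two extra G residues.
example_guide_5prime = add_five_prime_g("ACGU" * 5, 2)
```

Here the placeholder spacer is arbitrary; in practice it would be the target-recognizing sequence selected upstream of the PAM.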

In some embodiments of the present disclosure, the sgRNA may comprise a region complementary to a target DNA sequence (termed “Spacer region”, “Target DNA recognition sequence”, “base pairing region”, etc.), and a hairpin structure for binding to the Cas protein.

In some embodiments of the present disclosure, the sgRNA may comprise a region complementary to a target DNA sequence, a hairpin structure for binding to the Cas protein, and a terminator sequence. These elements may be, without limitation, sequentially arranged in the 5′ to 3′ direction.

In some embodiments of the present disclosure, any form of guide RNA can be used, as long as it contains respective essential portions of a crRNA and a tracrRNA and a region complementary to a target DNA.

In some embodiments of the present disclosure, the crRNA may hybridize with a target DNA.

In some embodiments of the present disclosure, the RGEN may consist of a Cas protein and a dualRNA, or a Cas protein and an sgRNA. Alternatively, the RGEN may comprise respective nucleic acids encoding a Cas protein and an sgRNA as constitutional elements, but is not limited thereto.

In some embodiments of the present disclosure, the guide RNA, e.g., crRNA or sgRNA, may contain a sequence complementary to a target DNA sequence, and may comprise one or more additional nucleotides located upstream of the crRNA or sgRNA, particularly at the 5′ end of the crRNA of the sgRNA or dualRNA. The additional nucleotides may be, but are not limited to, guanine (G) residues.

In some embodiments of the present disclosure, the guide RNA may comprise a sequence capable of forming a duplex with (i.e., forming a base pair with or hybridizing to) a complementary strand of a target DNA sequence adjacent to the PAM (proto-spacer-adjacent motif) sequence NNNNRYAC (SEQ ID NO: 1).

In some embodiments of the present disclosure, the guide RNA may comprise a first region, capable of forming a duplex with a complementary strand of a target DNA sequence, and a second region, comprising a stem-loop structure characterized by a stem 13-18 bp in length. In certain embodiments, the stem may comprise the nucleotide sequence of SEQ ID NO: 2 (5′-GUUUUAGUCCCUUGUG-3′) and a complementary sequence thereof.

In some embodiments of the present disclosure, the guide RNA may comprise a first region, capable of forming a duplex with a complementary strand of a target DNA sequence, and a second region comprising a stem-loop structure characterized by a loop 5-10 bp in length. The loop may comprise the nucleotide sequence of SEQ ID NO: 3 (5′-AUAUUCAA-3′).
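To show how a stem and loop of the stated lengths fit together, the sketch below assembles a hairpin from a 5′ stem arm, a loop, and the reverse complement of the stem arm. It is an illustration under stated assumptions: the function names are hypothetical, and the 3′ arm is taken to be a perfect Watson-Crick complement of SEQ ID NO: 2, whereas an actual sgRNA scaffold may contain wobble pairs, bulges, or further downstream sequence.

```python
# Watson-Crick complement table for RNA.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return rna.translate(RNA_COMPLEMENT)[::-1]


def build_stem_loop(stem_5prime, loop):
    """Assemble stem arm + loop + complementary arm, read 5' to 3'.

    Enforces the length ranges stated in the text: a stem of 13-18
    and a loop of 5-10 nucleotides.
    """
    if not 13 <= len(stem_5prime) <= 18:
        raise ValueError("stem length must be 13-18")
    if not 5 <= len(loop) <= 10:
        raise ValueError("loop length must be 5-10")
    return stem_5prime + loop + reverse_complement(stem_5prime)


STEM = "GUUUUAGUCCCUUGUG"  # SEQ ID NO: 2 (16 nt)
LOOP = "AUAUUCAA"          # SEQ ID NO: 3 (8 nt)
hairpin = build_stem_loop(STEM, LOOP)
```

With these inputs the assembled hairpin is 40 nt long: 16 nt for each stem arm plus the 8-nt loop.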

In some embodiments of the present disclosure, the Cas proteins and the guide RNAs, especially sgRNAs, which are described above or later, may be those that are not naturally occurring or are engineered. In addition, the factors described for each matter may be combined together for application.

In some embodiments of the present disclosure, the intracellular introduction of RGEN can be achieved by, but is not limited to, (1) delivering into cells the Cas9 protein, purified after bacterial overexpression, and the sgRNA (single guide RNA) that recognizes a specific HLA target sequence, prepared by in vitro transcription, or (2) delivering a plasmid carrying the Cas9 gene and the sgRNA into cells for expression or transcription.

In addition, proteins, RNAs or plasmid DNAs encompassed within the scope of the present disclosure can be introduced into cells through various methods known in the art, such as, without limitation, electroporation, or techniques using liposomes, viral vectors, nanoparticles, or PTD (protein transduction domain) fusion proteins.

In some embodiments, a method of the present disclosure may be used to cleave a target DNA comprising the PAM sequence of SEQ ID NO: 1, and more particularly to edit a genome. In this context, the Cas protein may be in an active form with a nuclease or nickase activity.

In certain embodiments, the Cas protein may be in a deactivated (inactivated) form. In this case, the method of the present disclosure is conducted in such a way that a target DNA sequence comprising the PAM sequence of SEQ ID NO: 1 is not cleaved, but is associated with the Cas protein.

Moreover, in some other embodiments, the Cas protein, more particularly, the deactivated Cas protein, may further comprise a transcription effector domain. In detail, the deactivated Cas protein may be linked to, without limitation, an activator, a repressor, or the like.

Given the transcription effector domain, the method, at least in some embodiments, may be applied to Cas-mediated gene expression regulation comprising transcriptional regulation or epigenetic regulation.

In accordance with another aspect, the present disclosure provides an isolated guide RNA comprising a sequence capable of forming a duplex with a complementary strand of a target DNA sequence adjacent to the PAM (proto-spacer-adjacent motif) NNNNRYAC (SEQ ID NO: 1). The isolated guide RNA may be one that is not naturally occurring or is artificially engineered. The individual elements are as described above.

In some embodiments of the present disclosure, the guide RNA may be a single guide RNA in which the sequence capable of forming a duplex with a complementary strand of a target DNA may range in length from 17 to 23 bp, from 18 to 23 bp, from 19 to 23 bp, particularly from 20 to 23 bp, and more particularly from 21 to 23 bp, without being limited thereto.

Further, the guide RNA, at least in some embodiments, may comprise one to three consecutive guanine (G) residues just upstream of the 5′ end of the complementary strand of the target DNA, but is not limited thereto. Additionally, the foregoing description of the additional nucleotides can also be applicable to this embodiment.

Also provided, in accordance with another aspect of the present disclosure, is a composition comprising a guide RNA comprising a sequence capable of forming a duplex with a complementary strand of a target DNA sequence adjacent to the PAM (proto-spacer-adjacent motif) sequence NNNNRYAC (SEQ ID NO: 1), or a DNA encoding the guide RNA. Each of the components, in at least some embodiments, is as described above.

In some embodiments of the present disclosure, the composition may further comprise a Cas protein recognizing the sequence NNNNRYAC (SEQ ID NO: 1) or a nucleic acid encoding the Cas protein.

In addition, in certain embodiments, the composition may be used for genome editing.

Further, in some embodiments, the composition may comprise: (i) a guide RNA comprising a sequence capable of forming a duplex with a complementary strand of a target DNA sequence adjacent to the PAM (proto-spacer-adjacent motif) NNNNRYAC (SEQ ID NO: 1), or a DNA encoding the guide RNA; and (ii) a deactivated Cas protein (dCas) or a nucleic acid encoding the dCas.

In an embodiment, the deactivated Cas protein may further comprise a transcription effector domain.

In some embodiments of the present disclosure, the composition may be used to isolate a DNA of interest comprising a target DNA sequence. In this regard, the deactivated Cas protein may be labeled with a tag useful for separation and purification, but is not limited thereto. The tag may be as described above.

In some embodiments of the present disclosure, the composition may be used for Cas-mediated gene expression regulation, comprising transcriptional regulation or epigenetic regulation.

In some embodiments of the present disclosure, the target DNA may be present in isolated cells, for example, eukaryotic cells. Examples of the eukaryotic cells include yeasts, fungi, protozoa, cells from plants, higher plants, insects or amphibians, and mammalian cells such as CHO, HeLa, HEK293, and COS-1 cells. Without limitation, cultured cells (in vitro), graft cells, primary cell culture (in vitro and ex vivo), in vivo cells, and mammalian cells including human cells are commonly used in the art.

In accordance with a still further aspect, the present disclosure provides a CRISPR-CAS system, comprising (i) a guide RNA comprising a sequence capable of forming a duplex with a target DNA sequence adjacent to the PAM (proto-spacer-adjacent motif) NNNNRYAC (SEQ ID NO: 1), or a DNA encoding the guide RNA; and (ii) a Cas protein recognizing the PAM sequence NNNNRYAC (SEQ ID NO: 1) or a nucleic acid encoding the Cas protein. The individual factors are as described above. These factors may be non-naturally occurring or engineered.

Still another aspect of the present disclosure pertains to a recombinant viral vector, comprising (i) an expression cassette for a guide RNA comprising a sequence capable of forming a duplex with a target DNA sequence adjacent to the PAM (proto-spacer-adjacent motif) sequence of NNNNRYAC (SEQ ID NO: 1), and (ii) an expression cassette for a Cas protein recognizing the PAM sequence of NNNNRYAC (SEQ ID NO: 1). The individual factors are as described above. These factors may be non-naturally occurring or engineered. The viral vector, at least in some embodiments, may be of AAV (adeno-associated virus) origin.

Yet another aspect of the present disclosure pertains to an isolated guide RNA comprising a sequence of 21-23 bp in length, capable of forming a duplex with a complementary strand of a target DNA sequence. The guide RNA is as defined above. The guide RNA may be non-naturally occurring or engineered.

Yet still another aspect of the present disclosure pertains to a composition comprising the guide RNA or a DNA encoding the guide RNA. The individual factors are as described above. These factors may be non-naturally occurring or engineered.

The composition, at least in some embodiments, may comprise a Cas protein that recognizes the PAM sequence NNNNRYAC (SEQ ID NO: 1), or a nucleic acid that encodes the Cas protein.

In addition, the composition, in some embodiments, may comprise a deactivated Cas protein recognizing the NNNNRYAC sequence (SEQ ID NO: 1), or a nucleic acid encoding the deactivated Cas protein. The deactivated Cas protein, in some embodiments, may further comprise a transcription effector domain.

According to an additional aspect, the present disclosure provides an isolated guide RNA, comprising a first region comprising a sequence capable of forming a duplex with a complementary strand of a target DNA sequence, and a second region comprising a stem-loop structure characterized by a stem 13-18 bp in length. The individual factors are as defined above. These factors may be non-naturally occurring or engineered.

In certain embodiments, the stem may comprise the nucleotide sequence of SEQ ID NO: 2 (5′-GUUUUAGUCCCUUGUG-3′) and a complementary sequence thereof.

According to a further additional aspect, the present disclosure provides an isolated guide RNA, comprising a first region comprising a sequence capable of forming a duplex with a complementary strand of a target DNA sequence, and a second region comprising a stem-loop structure characterized by a loop 5-10 bp in length. The individual factors are as defined above. These factors may be non-naturally occurring or engineered.

In certain embodiments, the loop may comprise the nucleotide sequence of SEQ ID NO: 3 (5′-AUAUUCAA-3′).

According to yet an additional aspect, the present disclosure provides a composition comprising a guide RNA, along with a Cas protein or a nucleic acid encoding the Cas protein. The individual factors are as defined above. These factors may be non-naturally occurring or engineered.

Yet still another aspect of the present disclosure provides a method for genome editing in a cell, comprising introducing into the cell an isolated guide RNA or a DNA encoding the isolated guide RNA, together with a Cas protein or a nucleic acid encoding the Cas protein. The individual factors are as defined above. These factors may be non-naturally occurring or engineered.

Yet a further aspect of the present disclosure provides a method for cleaving a target DNA in a cell, comprising introducing into the cell an isolated guide RNA or a DNA encoding the isolated guide RNA, along with a Cas protein or a nucleic acid encoding the Cas protein. The individual factors are as defined above. These factors may be non-naturally occurring or engineered.

In certain embodiments, the guide RNA or the DNA encoding the guide RNA may be introduced into a cell simultaneously or sequentially with the Cas protein or the nucleic acid encoding the Cas protein.

A still further aspect of the present disclosure provides a method for preparing a target DNA-recognizing sequence of a guide RNA (i.e., a sequence in a guide RNA that is responsible for recognizing a target DNA), comprising: (i) identifying the presence of a PAM sequence NNNNRYAC (SEQ ID NO: 1) in a given sequence; and (ii) determining a sequence located just upstream of the PAM sequence NNNNRYAC (SEQ ID NO: 1) as being recognizable by a guide RNA, if the presence of the PAM sequence is identified in step (i). The individual factors are as defined above. These factors may be non-naturally occurring or engineered.

In some embodiments of the disclosure, the sequence located upstream of the PAM sequence may range, without limitation, in length from 17 to 23 bp, from 18 to 23 bp, from 19 to 23 bp, more particularly from 20 to 23 bp, and even more particularly from 21 to 23 bp.
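By way of illustration only, steps (i) and (ii) above may be sketched in Python as follows. The function name find_targets, the default spacer length of 22, and the restriction to the given strand are assumptions made for this sketch and are not part of the disclosure; the other strand could be covered by running the same function on the reverse complement.

```python
import re

# IUPAC codes needed for the PAM NNNNRYAC (N = any base, R = A/G, Y = C/T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]"}


def find_targets(seq, pam="NNNNRYAC", spacer_len=22):
    """Return (start, upstream_sequence, pam) for each PAM found in seq.

    Hypothetical helper: step (i) locates the PAM, step (ii) reports the
    spacer_len bases located just upstream of it. Only the strand given
    in seq is scanned.
    """
    seq = seq.upper()
    pam_re = "".join(IUPAC[c] for c in pam)
    # The lookahead lets overlapping candidate sites be reported as well.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_re}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# 20-nt AAVS1 target followed by the 8-nt PAM, as listed for hAAVS1-CJ1 in TABLE 8.
site = "ATATAAGGTGGTCCCAGCTC" + "GGGGACAC"
print(find_targets(site, spacer_len=20))
```

The spacer_len argument can be set anywhere in the 17 to 23 range described above; a sequence whose last PAM base is not C yields no hit.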

Yet another aspect of the present disclosure provides a method for isolating a DNA of interest, comprising: (i) introducing into a cell a guide RNA or a DNA encoding the guide RNA, along with a deactivated Cas protein or a nucleic acid encoding the deactivated Cas protein, thereby permitting the guide RNA and the deactivated Cas protein to form a complex together with the DNA of interest comprising a target DNA sequence; and (ii) separating the complex from a sample. The individual factors are as defined above. These factors may be non-naturally occurring or engineered. The deactivated Cas protein, at least in some embodiments, may recognize the PAM (proto-spacer-adjacent motif) sequence NNNNRYAC (SEQ ID NO: 1).

In certain embodiments, the method for isolating a DNA of interest may be performed by allowing a guide RNA (gRNA), binding specifically to the DNA of interest, and a deactivated Cas protein (dCas) to form a dCas-gRNA-DNA of interest complex with the DNA of interest; and separating the complex from a sample. The DNA of interest, in some embodiments, may be identified using a well-known detection method, such as PCR amplification, etc. The isolation method, in some embodiments, may be adapted for cell-free DNA in vitro without forming crosslinks via covalent bonds between the DNA, the gRNA, and the dCas. In addition, the isolation method may further comprise isolating the DNA of interest from the complex in some embodiments.

The deactivated Cas protein, in some embodiments, may be linked with an affinity tag for use in isolating the DNA of interest. The affinity tag may be selected from the group consisting of a His tag, a Flag tag, an S tag, a GST (glutathione S-transferase) tag, an MBP (maltose binding protein) tag, a CBP (chitin binding protein) tag, an Avi tag, a calmodulin tag, a polyglutamate tag, an E tag, an HA tag, a myc tag, an SBP tag, softag 1, softag 3, a strep tag, a TC tag, an Xpress tag, a BCCP (biotin carboxyl carrier protein) tag, and a GFP (green fluorescent protein) tag, but is not limited thereto. The deactivated Cas protein, in some embodiments, may be a Cas protein that lacks DNA cleavage activity.

Isolation of a DNA of interest, in some embodiments, may be achieved using an affinity column or magnetic beads capable of binding the tag used. For example, when a His tag is used to isolate the DNA of interest, a metal affinity column or magnetic beads capable of binding the His tag may be employed. The magnetic beads may comprise, but are not limited to, Ni-NTA magnetic beads.

In some embodiments, isolation of a DNA of interest from the complex may be conducted using RNase and protease.

In some embodiments of the method for isolating a DNA of interest, a DNA of a certain genotype, or two or more different DNAs of interest, can be isolated from an isolated sample containing a mixture of DNAs of two or more different genotypes. When the method involves isolating two or more different DNAs of interest, guide RNAs respectively specific for the two or more different DNAs of interest may be employed to isolate the two or more DNAs of interest.

In certain embodiments, the guide RNA may be a single guide RNA (sgRNA), or a dual RNA comprising crRNA and tracrRNA. The guide RNA may be an isolated RNA, or may be encoded in a plasmid.

The isolation method, in certain embodiments, may be performed by binding a guide RNA (gRNA) specifically to 1) a DNA of interest and 2) a deactivated Cas protein (dCas) to form a dCas-gRNA-DNA complex with the DNA of interest; and separating the complex from the sample.

Yet an additional aspect of the present disclosure provides a method for Cas-mediated gene expression regulation in a DNA of interest comprising a target DNA sequence, the method comprising introducing into a cell an isolated guide RNA specifically recognizing the target DNA, or a DNA encoding the guide RNA, along with a deactivated Cas protein fused to a transcription effector domain or a nucleic acid encoding the deactivated Cas protein. The individual factors are as defined above. These factors may be non-naturally occurring or engineered.

EXAMPLES

The following examples are provided for the purpose of illustrating some aspects of the disclosure provided herewith and they should not be construed as limiting the scope of the present disclosure in any manner.

C. jejuni CRISPR/Cas9 System

Example 1: Genome Editing using C. jejuni CRISPR/Cas9

The present inventors succeeded in isolating RGEN from C. jejuni. To identify the characteristics of the C. jejuni CRISPR/CAS9-derived RGEN with regard to genome editing, a C. jejuni CAS9 gene optimized for human codons was synthesized (TABLE 1) and then inserted into a mammalian expression vector to construct a C. jejuni CAS9 expression cassette in which the HA-tagged, NLS-linked Cas gene was under the regulation of a CMV promoter (FIG. 1).

TABLE 1 Amino Acid Sequence of C. jejuni Cas9 Protein SEQ IDAmino acid sequence size NO MARILAFDIGISSIGWAFSENDELKDCGVRIFTKVE 1003a.a22 NPKTGESLALPRRLARSARKRLARRKARLNHLKHLIANEFKLNYEDYQSFDESLAKAYKGSLISPYELRFRALNELLSKQDFARVILHIAKRRGYDDIKNSDDKEKGAILKAIKQNEEKLANYQSVGEYLYKEYFQKFKENSKEFTNVRNKKESYERCIAQSFLKDELKLIFKKQREFGFSFSKKFEEEVLSVAFYKRALKDFSHLVGNCSFFTDEKRAPKNSPLAFMFVALTRIINLLNNLKNTEGILYTKDDLNALLNEVLKNGTLTYKQTKKLLGLSDDYEFKGEKGTYFIEFKKYKEFIKALGEHNLSQDDLNEIAKDITLIKDEIKLKKALAKYDLNQNQIDSLSKLEFKDHLNISFKALKLVTPLMLEGKKYDEACNELNLKVAINEDKKDFLPAFNETYYKDEVTNPVVLRAIKEYRKVLNALLKKYGKVHKINIELAREVGKNHSQRAKIEKEQNENYKAKKDAELECEKLGLKINSKNILKLRLFKEQKEFCAYSGEKIKISDLQDEKMLEIDHIYPYSRSFDDSYMNKVLVFTKQNQEKLNQTPFEAFGNDSAKWQKIEVLAKNLPTKKQKRILDKNYKDKEQKNFKDRNLNDTRYIARLVLNYTKDYLDFLPLSDDENTKLNDTQKGSKVHVEAKSGMLTSALRHTWGFSAKDRNNHLHHAIDAVIIAYANNSIVKAFSDFKKEQESNSAELYAKKISELDYKNKRKFFEPFSGFRQKVLDKIDEIFVSKPERKKPSGALHEETFRKEEEFYQSYGGKEGVLKALELGKIRKVNGKIVKNGDMFRVDIFKHKKTNKFYAVPIYTMDFALKVLPNKAVARSKKGEIKDWILMDENYEFCFSLYKDSLILIQTKDMQEPEFVYYNAFTSSTVSLIVSKHDNKFETLSKNQKILFKNANEKEVIAKSIGIQNLKVFEKYIVSALGEVT KAEFRQREDFKKSGPPKKKRKVYPYDVPDYA-

The native guide RNA of the C. jejuni CRISPR/CAS9 system consists of tracrRNA and target-specific crRNA. In view of the notion that the guide RNA is used as the two RNA molecules in themselves or as a single guide RNA (sgRNA) in which crRNA and tracrRNA are fused to each other, the present inventors designed and constructed an expression plasmid for C. jejuni sgRNA (TABLE 2).

TABLE 2
sgRNA: C. jejuni_sgRNA
sgRNA sequence: NNNNNNNNNNNNNNNNNNNNGTTTTAGTCCCT GAAA AGGGACTAAAAT AAAGAGTTTGCGGGACTCTGCGGGGTTACAATCCCCTAAAACCGCTTTTTTT
SEQ ID NO: 23
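As a non-limiting sketch, the sgRNA-encoding sequence of TABLE 2 can be assembled from a target-specific spacer and the fixed remainder of the sequence. The segment boundaries follow the spacing shown in TABLE 2, but the segment names (STEM_5P, LOOP, STEM_3P, TRACR_PART, T_RUN) and the function build_sgrna_dna are labels introduced here for illustration only and are not terms used in the disclosure.

```python
# Segments copied from the sgRNA sequence of TABLE 2 (DNA form).
# The names are illustrative labels, not terminology from the disclosure.
STEM_5P = "GTTTTAGTCCCT"
LOOP = "GAAA"
STEM_3P = "AGGGACTAAAAT"
TRACR_PART = "AAAGAGTTTGCGGGACTCTGCGGGGTTACAATCCCCTAAAACCGC"
T_RUN = "TTTTTTT"


def build_sgrna_dna(spacer, loop=LOOP):
    """Join a spacer (the N positions of TABLE 2) to the fixed segments."""
    spacer = spacer.upper()
    if not spacer or set(spacer) - set("ACGT"):
        raise ValueError("spacer must be a non-empty A/C/G/T string")
    return spacer + STEM_5P + loop + STEM_3P + TRACR_PART + T_RUN


# Spacer for the human AAVS1 target of TABLE 3 (first 20 nt of the target sequence).
print(build_sgrna_dna("ATATAAGGTGGTCCCAGCTC"))
```

Passing a different loop, for example loop="ATATTCAA", reproduces the arrangement listed for the loop-modified sgRNA in TABLE 5.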

Then, potential target loci for human AAVS1 and mouse Rosa-26 were selected based on the PAM sequence (NNNNACA) of the C. jejuni CRISPR/CAS9 system (TABLE 3).

TABLE 3
sgRNA                     Target Sequence                SEQ ID NO
Human AAVS1_C. Jejuni     ATATAAGGTGGTCCCAGCTCGGGGACA    24
Mouse Rosa26_C. Jejuni    ATTCCCCTGCAGGACAACGCCCACACA    25

To examine whether the C. jejuni RGEN can be used for the targeted disruption of endogenous genes in mammalian cells, genomic DNA isolated from transfected cells was analyzed using T7 endonuclease I (T7E1), a mismatch-sensitive endonuclease that specifically recognizes and cleaves heteroduplexes formed by the hybridization of wild-type and mutant DNA sequences. The primer sequences used are as follows (TABLE 4).

TABLE 4
Primer            Sequence                SEQ ID NO
Human AAVS1-F     TGCTTCTCCTCTTGGGAAGT    26
Human AAVS1-R     CCCCGTTCTCCTGTGGATTC    27
Mouse Rosa26-F    ACGTTTCCGACTTGAGTTGC    28
Mouse Rosa26-R    CCCAGCTACAGCCTCGATTT    29

As a result, mutations (interchangeably substitution or variation) were detected only in the cells into which the CAS9 protein and the guide RNA were introduced together. The mutation frequency was found to be RNA-dose dependent, as measured based on relative DNA band intensities (FIG. 2A). In addition, DNA sequencing analysis of the PCR amplicons corroborated the induction of RGEN-mediated mutations at the endogenous sites. Indels and microhomologies, which are characteristic of error-prone nonhomologous end joining (NHEJ) repair, were observed at the target sites (FIG. 2B). The mutation frequency was 16.7% as measured by direct sequencing (2 mutant clones/12 clones).

Likewise, when mouse Rosa26 C. jejuni RGEN was delivered into mouse NIH3T3 cells, mutations were effectively induced at the mouse Rosa26 site, as measured by a T7E1 assay (FIG. 3A). In addition, DNA sequencing analysis of the PCR amplicons revealed the induction of C. jejuni RGEN-mediated mutation at the endogenous gene sites (FIG. 3B). The mutation frequency was found to be 22.2% as measured by direct sequencing (2 mutant clones/9 clones).
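The direct-sequencing frequencies reported above follow from the clone counts, as the short Python check below illustrates. The function name mutation_frequency is introduced here for illustration only.

```python
def mutation_frequency(mutant_clones, total_clones):
    """Percentage of sequenced clones that carry a mutation."""
    if total_clones <= 0:
        raise ValueError("total_clones must be positive")
    return 100.0 * mutant_clones / total_clones


# Clone counts reported for the human AAVS1 and mouse Rosa26 sites.
print(f"AAVS1:  {mutation_frequency(2, 12):.1f}%")
print(f"Rosa26: {mutation_frequency(2, 9):.1f}%")
```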

Example 2: Structural Modification of sgRNA

With the anticipation that the C. jejuni crRNA:tracrRNA complex would comprise a shorter loop structure than those from other bacterial species, a modified stem or loop structure was designed to structurally stabilize the C. jejuni RGEN sgRNA constructed in Example 1 (TABLE 5).

TABLE 5 SEQ ID sgRNAs sgRNA Sequence NO C. jejuni_ NNNNNNNNNNNNNNNNNNNNGTTTTAGTCCC 23 sgRNA T GAAA AGGGACTAAAAT AAAGAGTTTGCGGGACTCTGCGGGGTTACAATCCCCTAAAACCGC TTTTTTT C. jejuni_ NNNNNNNNNNNNNNNNNNNNGTTTTAGTCCC 30 sgRNA_stem T TGTGGAAATATA AGGGACTAAAAT AAAGAG modifiedTTTGCGGGACTCTGCGGGGTTACAATCCCCT AAAACCGCTTTTTTT C. jejuni_NNNNNNNNNNNNNNNNNNNN GTTTTAGTCCC 31 sgRNA_loop T ATATTCAA AGGGACTAAAATAAAGAGTTTG modified CGGGACTCTGCGGGGTTACAATCCCCTAAAA CCGCTTTTTTT

In TABLE 5, norm stem parts are shown in bold and underlined.

When the modified sgRNA was introduced to target the human AAVS1 C. jejuni RGEN target site, at which mutations had been successfully induced using the normal sgRNA structure, similar mutation frequencies were observed (FIG. 4). In this regard, the primer sequences used are as shown in TABLE 4.

Example 3: Optimization of Length of sgRNA Spacer

The spacer sequence of C. jejuni crRNA, recognizing a target sequence, was reported to be 20 bp in length in the literature. To determine which spacer length is optimal, a genome editing test was performed for 4 target sites of Cj Cas9 on human AAVS1 loci, as shown in TABLE 6, using spacers with various lengths, and sgRNA mutant structures with additional nucleotides at the 5′ terminus (FIGS. 5A to 5C). For the method used in this experiment, reference was made to Genome Res. 2014 January; 24(1):132-41.

TABLE 6
sgRNA               Target Site Sequence (20bp-SPACERnnnnACA)    SEQ ID NO
Human AAVS1-CJ1     ATATAAGGTGGTCCCAGCTCggggACA                  32
Human AAVS1-NRG1    GTAGAGGCGGCCACGACCTGgtgaACA                  33
Human AAVS1-NRG3    TCACAAAGGGAGTTTTCCACacggACA                  34
Human AAVS1-NRG5    TAGGCAGATTCCTTATCTGGtgacACA                  35

Three days after an sgRNA expression vector was delivered into 293-cells, genomic DNA was isolated and analyzed for mutation efficiency by deep sequencing. The results are depicted in FIG. 5C. As can be seen, high efficiency was detected when the spacers ranged in length from 21 to 23 bp. In addition, even when 2-3 additional G residues were added to the 5′ end of sgRNA of a 20 bp-long spacer, an improvement in genome editing was observed.

TABLE 7 NGS- NGS- primer- primer- Target F* Sequences R** SequencessgRNA Human AS-AV- ACACTCTTTC AS-AV- GTGACTGGAG CJ1 AAVS1 F1 CCTACACGACR1 TTCAGACGTG GCTCTTCCGA TGCTCTTCCG TCTAGGAGGA ATCTTGTCAT GGCCTAAGGAGGCATCTTCC TGG AGGG (SEQ ID (SEQ ID  NO: 36) NO: 39) AS-AV- ACACTCTTTCAS-AV- GTGACTGGAG NRG1, F2 CCTACACGAC R2 TTCAGACGTG NRG3 GCTCTTCCGATGCTCTTCCG TCTGCTCTGG ATCTTCCGTG GCGGAGGAAT CGTCAGTTTT ATG ACCT (SEQ ID(SEQ ID NO: 37) NO: 40) AS-AV- ACACTCTTTC AS-AV- GTGACTGGAG NRG5 F4CCTACACGAC R4 TTCAGACGTG GCTCTTCCGA TGCTCTTCCG TCTATCCTCT ATCTCCGGTTCTGGCTCCAT AATGTGGCTC CGT TGGT (SEQ ID (SEQ ID NO: 38) NO: 41) Here, F*indicates a forward primer and R** indicates a reverse primer.

Example 4: C. jejuni Cas9 PAM Sequence Analysis

In the present disclosure, the PAM sequence of C. jejuni Cas9 was inferred to comprise “NNNNACA”, based on data in the existing literature, and experiments were conducted. Of 34 C. jejuni CRISPR/Cas9 systems constructed for five genome sites, only three exhibited activity. Particularly, additional analysis of the sequences covering the sites in the three active systems showed that the nucleotide “C” was identified immediately after the PAM sequence (NNNNACA) in all three sites (TABLE 8).

TABLE 8 Activ- ity SEQ (T7E1 ID sgRNA assay) Sequence NO Human AAVS1hAAVS1- O ATATAAGGTGGTCCCAGCTCGGG 42 CJ1 GACA C hAAVS1- XTGGCCCCACTGTGGGGTGGAGGGGA 43 CJ2 CAG hAAVS1- X CACCCCACAGTGGGGCCACTAGGGA44 CJ3 CAG CCR5 CCR5- X CTAGCAGCAAACCTTCCCTTCACTAC 45 CJ1 AA CCR5- XCTCCATGAATGCAAACTGTTTTATAC 46 CJ2 AT CCR5- X TGCATTCATGGAGGGCAACTAAATA47 CJ3 CAT CCR5- X ATCAAGTGTCAAGTCCAATCTATGA 48 CJ4 CAT CCR5- XCCAATCTATGACATCAATTATTATAC 49 CJ5 AT CCR5- X GCAAAAGGCTGAAGAGCATGACTG 50CJ6 ACAT CCR5- X GCAGCATAGTGAGCCCAGAAGGGG 51 CJ7 ACAG CCR5- XGCCGCCCAGTGGGACTTTGGAAATA 52 CJ8 CAA Mouse Rosa26 ROSA26- XTCCACTGCAGCTCCCTTACTGATAAC 53 CJ1 AA ROSA26- O ATTCCCCTGCAGGACAACGCCCAC54 CJ2* ACA C ROSA26- X ACACCTGTTCAATTCCCCTGCAGGA 55 CJ3 CAA ROSA26- XTTGAACAGGTGTAAAATTGGAGGGA 56 CJ4 CAA ROSA26- XTTGCCCCTATTAAAAAACTTCCCGAC 57 CJ5 AA ROSA26- X AGATCCTTACTACAGTATGAAATTA58 CJ6 CAG ROSA26- X AGCCTTATCAAAAGGTATTTTAGAA 59 CJ7 CAC TP53 TP53- XCGGGGCCCACTCACCGTGCACATAA 60 CJ1 CAG TP53- X GCCGTGTCCGCGCCATGGCCATCTA61 CJ2 CAA TP53- X TGGCCATCTACAAGAAGTCACAGCA 62 CJ3 CAT TP53- XCCGAGTGTCAGGAGCTCCTGCAGCA 63 CJ4 CAG TP53- X CTCCCCGGGGCCCACTCACCGTGCA64 CJ5 CAT TP53- X CCTGTGCAGTTGTGGGTCAGCGCCA 65 CJ6 CAC TP53- XGGTGTGGCGCTGACCCACAACTGCA 66 CJ7 CAG TP53- O TTCTTGTAGATGGCCATGGCGCG 67CJ8 GACA C TP53- X CGCCATGGCCATCTACAAGAAGTCA 68 CJ9 CAG PTEN mPTEN- XACATCATCAATATTGTTCCTGTATAC 69 CJ1 AC mPTEN- X TGAATCCAAAAACCTTAAAACAAAA70 CJ2 CAA mPTEN- X TGCTTTGAATCCAAAAACCTTAAAA 71 CJ3 CAA mPTEN- XAGCATAAAAACCATTACAAGATATA 72 CJ4 CAA mPTEN- X GTAGATGTGCTGAGAGACATTATGA73 CJ5 CAC mPTEN- X GGCGGTGTCATAATGTCTCTCAGCA 74 CJ6 CAT mPTEN- XATTTAACTGCAGAGGTATGTATAAA 75 CJ7 CAT

Based on this result, the PAM sequence was inferred to contain “NNNNACAC”. The nucleotide at each position of “ACAC” was then substituted with A, T, G, or C, and the activity of C. jejuni Cas9 was analyzed to identify the PAM sequence of the C. jejuni RGEN. For this, a surrogate reporter assay was utilized. As a result, the C. jejuni RGEN was identified to have the PAM sequence “NNNNRYAC (SEQ ID NO: 1)” (FIG. 6), wherein R is a purine residue (A or G) and Y is a pyrimidine residue (C or T). This experiment was carried out using the surrogate reporter assay described in Nat Methods. 2011 Oct. 9; 8(11):941-3.
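For illustration, the degenerate PAM can be checked against the eight nucleotides that follow a 20-nt spacer in the sequences of TABLE 8. The Python sketch below uses the standard IUPAC meanings of N, R, and Y; the helper names split_site and matches_pam are assumptions of this sketch. A match indicates only that the PAM requirement is met and does not by itself predict activity, since TABLE 8 also lists inactive sites whose sequence ends in C.

```python
# IUPAC nucleotide codes used in NNNNRYAC.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}


def split_site(seq, spacer_len=20, pam_len=8):
    """Split a TABLE 8 style sequence into (spacer, PAM)."""
    return seq[:spacer_len], seq[spacer_len:spacer_len + pam_len]


def matches_pam(site, pam="NNNNRYAC"):
    """True if every base of site is allowed by the corresponding PAM code."""
    site = site.upper()
    return len(site) == len(pam) and all(b in IUPAC[p] for b, p in zip(site, pam))


# The three active sites of TABLE 8 and one inactive site for comparison.
sites = {
    "hAAVS1-CJ1": "ATATAAGGTGGTCCCAGCTCGGGGACAC",
    "ROSA26-CJ2": "ATTCCCCTGCAGGACAACGCCCACACAC",
    "TP53-CJ8": "TTCTTGTAGATGGCCATGGCGCGGACAC",
    "hAAVS1-CJ2": "TGGCCCCACTGTGGGGTGGAGGGGACAG",
}
for name, seq in sites.items():
    spacer, pam = split_site(seq)
    print(name, pam, matches_pam(pam))
```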

Example 5: Assay of Specificity and PAM Sequence of C. jejuni CRISPR/Cas9

The cleavage sites of C. jejuni CRISPR/CAS9 in the AAVS1-CJ1 loci were analyzed at the genomic level using Digenome-seq, a CRISPR/Cas9 off-target assay developed and submitted for patent protection by the present inventors. The experiment was carried out using a method described in Nat Methods. 2015 March; 12(3):237-43.

Through Digenome-Seq, 41 loci that seemed to be cleaved by AAVS1-CJ1 CRISPR/Cas9 were determined (Genomic locations in TABLE 9). Consensus sequences were obtained from alignments of cleavage site sequences of the 41 loci, and a PAM consistent with that identified in Example 4 was verified.

Further, to examine whether an off-target mutation is actually introduced into the potential off-targets acquired by Digenome-Seq, genomic DNA from 293-cells to which AAVS1-CJ1 CRISPR engineered nuclease was delivered was subjected to deep sequencing for 40 potential off-target sites. As shown in TABLE 9, no significant mutations were observed.

TABLE 9
                                         Indel Frequency
Site           Genomic Location          Mock     C. Jejuni CRISPR
On-target      chr19  55627221           0.02     5.123
CJ_AAVS1_1     chr1   24521012           0.019    0.034
CJ_AAVS1_2     chr1   29848565           0.157    0.136
CJ_AAVS1_3     chr1   30381084           0.041    0.035
CJ_AAVS1_4     chr1   37283269           0.016    0.016
CJ_AAVS1_5     chr2   55333369           0.079    0.091
CJ_AAVS1_6     chr4   153532801          0.003    0.003
CJ_AAVS1_7     chr4   153926891          0        0
CJ_AAVS1_8     chr4   183304101          0.033    0.046
CJ_AAVS1_9     chr6   51746466           0.41     0.43
CJ_AAVS1_10    chr7   11346020           0.02     0.038
CJ_AAVS1_11    chr7   128481430          0.024    0.036
CJ_AAVS1_12    chr7   142878579          0.024    0.028
CJ_AAVS1_13    chr8   25979587           0.138    0.155
CJ_AAVS1_14    chr8   80240626           0.043    0.049
CJ_AAVS1_15    chr8   141347249          0.028    0.024
CJ_AAVS1_16    chr8   141688584          0.088    0.092
CJ_AAVS1_17    chr8   143120119          0.016    0.013
CJ_AAVS1_18    chr9   83960768           0.032    0.037
CJ_AAVS1_19    chr9   102650644          0.029    0.034
CJ_AAVS1_20    chr9   129141695          0.014    0.009
CJ_AAVS1_21    chr10  103862556          0.053    0.073
CJ_AAVS1_22    chr12  9085293            0.21     0.277
CJ_AAVS1_23    chr14  70581187           0.013    0.025
CJ_AAVS1_24    chr14  95327446           0.046    0.041
CJ_AAVS1_25    chr14  102331176          0.015    0.028
CJ_AAVS1_26    chr14  104753692          0.035    0.041
CJ_AAVS1_27    chr15  67686972           0.061    0.096
CJ_AAVS1_28    chr16  85565862           0.028    0.028
CJ_AAVS1_29    chr17  17270109           0.003    0
CJ_AAVS1_30    chr17  79782954           0.03     0.043
CJ_AAVS1_31    chr18  42305670           0.035    0.043
CJ_AAVS1_32    chr19  12826405           0.024    0.039
CJ_AAVS1_33    chr19  32268337           0.043    0.042
CJ_AAVS1_35    chr20  40758976           0        0
CJ_AAVS1_36    chr21  41295936           0.011    0.007
CJ_AAVS1_37    chr22  20990738           0.004    0.004
CJ_AAVS1_38    chr22  46402289           0.006    0.011
CJ_AAVS1_39    chr22  46426607           0.003    0
CJ_AAVS1_40    chrX   27472673           0.279    0.318
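One simple way to read TABLE 9 is to subtract the mock value from the value obtained with the C. jejuni CRISPR nuclease at each site. The Python sketch below does this for a few rows copied from the table. The function name sites_above_background and the cutoff of 0.1 are hypothetical choices made only for this illustration; the disclosure does not state the criterion used to judge significance.

```python
# A few rows copied from TABLE 9: (site, mock, C. jejuni CRISPR).
ROWS = [
    ("On-target", 0.02, 5.123),
    ("CJ_AAVS1_1", 0.019, 0.034),
    ("CJ_AAVS1_9", 0.41, 0.43),
    ("CJ_AAVS1_22", 0.21, 0.277),
    ("CJ_AAVS1_40", 0.279, 0.318),
]


def sites_above_background(rows, cutoff=0.1):
    """Return sites whose treated-minus-mock difference exceeds cutoff.

    The cutoff is a hypothetical value chosen for this sketch only.
    """
    return [site for site, mock, treated in rows if treated - mock > cutoff]


print(sites_above_background(ROWS))
```

With these rows and this illustrative cutoff, only the on-target site is returned.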

Further, consensus sequences were obtained from the entire alignment of the sequences of 41 loci that showed cleavages in vitro. Consistent with previous results, the PAM was actually observed as NNNNRYAC (SEQ ID NO: 1).

Example 6: Degeneracy at First Two Nucleotides of PAM

The PAM sequence of C. jejuni was found to be “NNNNRYAC” as well as “NNNNACAC” in Example 5, showing degeneracy at the first two positions. In order to corroborate the degeneracy, sgRNAs were constructed respectively for the 7 PAM target sequences of C. jejuni of human AAVS1 loci, which carried G or T residues at the first two positions (TABLE 10), and analyzed for mutation efficiency in HEK293 cells.

TABLE 10 (PAM for all rows: NNNNRYAC)
sgRNA          Direction    Target Sequence                   SEQ ID NO
hAAVS1-RYN1    +            gCCACGACCTGGTGAACACCTAGGACGCAC    76
hAAVS1-RYN2    +            gGCCTTATCTCACAGGTAAAACTGACGCAC    77
hAAVS1-RYN3    +            cTCTTGGGAAGTGTAAGGAAGCTGCAGCAC    78
hAAVS1-RYN4    +            aGCTGCAGCACCAGGATCAGTGAAACGCAC    79
hAAVS1-RYN5    +            cTGTGGGGTGGAGGGGACAGATAAAAGTAC    80
hAAVS1-RYN6    −            gCCGGTTAATGTGGCTCTGGTTCTGGGTAC    81
hAAVS1-RYN7    +            gCCATGACAGGGGGCTGGAAGAGCTAGCAC    82

Of the seven constructed sgRNAs, six were found to induce mutations, demonstrating degeneracy at the first two positions of the PAM sequences (FIG. 8). Accordingly, this degeneracy increases the frequency of the PAM sequences, allowing improved accuracy of the genome editing of C. jejuni.
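The effect of the degeneracy on PAM frequency can be illustrated by enumerating the concrete sequences covered by each motif. The Python sketch below is illustrative only: the function name expand is introduced here, and the per-position frequency estimate assumes, as a simplification, that the four bases are equally likely and independent at every position.

```python
from fractions import Fraction
from itertools import product

# IUPAC nucleotide codes used in the PAM.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}


def expand(motif):
    """List every concrete sequence covered by a degenerate motif."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in motif))]


strict = expand("ACAC")      # last four PAM positions without degeneracy
degenerate = expand("RYAC")  # last four PAM positions with R and Y

print(strict)
print(degenerate)
# Chance that a random 4-mer satisfies each motif (uniform-base assumption).
print(Fraction(len(strict), 4 ** 4), Fraction(len(degenerate), 4 ** 4))
```

Under this simplified assumption, RYAC is satisfied four times as often as ACAC, which is the sense in which the degeneracy increases the frequency of usable PAM sequences.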

Example 7: Genome Editing through C. jejuni CRISPR/CAS9 Delivery Using AAV

Representative among promising fields in which genome editing finds application are genome editing technologies for gene and cell therapy. The practical application of genome editing to therapy needs a clinically applicable vector for effectively delivering an engineered nuclease and a donor DNA to target cells in vitro or in vivo. The two most widely used engineered nuclease platforms, TALENs and RGENs, are limited in their application to established gene therapy vectors due to their large sizes. In contrast, the C. jejuni RGEN of the present disclosure consists of the smallest CAS9 protein and sgRNA among the RGENs developed so far. Thanks to its small size, the C. jejuni RGEN can allow large-sized gene therapy vectors to be used in genome manipulation. For example, AAV (adeno-associated virus), serving as one of the most important vectors for gene therapy, imposes strict limitations on the size of the DNA to be carried thereby, and thus is difficult to apply to the RGEN derived from S. pyogenes, S. thermophilus, or N. meningitidis, or to the currently used engineered nuclease platform TALEN. In contrast, the C. jejuni RGEN can be applied to an AAV vector.

In the present disclosure, the operation of the C. jejuni Cas9 was examined through practical AAV delivery. To this end, an AAV vector carrying both a C. jejuni Cas9 expression cassette and an sgRNA expression cassette was constructed (FIG. 9) and used to produce AAV. After infection with the AAV, mouse C2C12 cells were quantitatively analyzed for mutations (FIG. 10). As can be seen, mutations were induced in target sites in an AAV dose- and time-dependent manner. Particularly, 4 weeks after infection at high MOI (100), mutations were induced at an efficiency of 90% or higher in the target sites.

Consequently, the C. jejuni RGEN was proven to effectively perform genome editing in cultured cells. In addition, the PAM sequence of the C. jejuni CRISPR/Cas9 system was actually determined, as the sequence proposed in previous studies was found not to be perfect. Further, the C. jejuni RGEN can be loaded into a single virus thanks to the small sizes of its elements, and thus can be used for effective genome editing.

Enrichment of Target DNA Using dCAS9:gRNA Complex

Moreover, a target DNA was isolated and enriched using the RGEN (dCas9:gRNA complex) composed of a Streptococcus pyogenes-derived, deactivated Cas9 protein and a guide RNA.

In this regard, the dCas9 protein was tagged with six consecutive His residues, so that it could be purified using Ni-NTA magnetic beads for selectively binding to the His tag. In addition, the dCas protein-sgRNA complex can be used for the selective purification of a target DNA because the complex can bind specifically to a certain DNA sequence, but lacks nuclease activity.

The RGEN (dCas9:gRNA complex) composed of a guide RNA and a deactivated Cas nuclease was tested for ability to isolate a target DNA. For this, first, the plasmid pUC19 was digested with restriction enzymes (SpeI, XmaI, XhoI) to yield plasmid DNA fragments 4134 bp, 2570 bp, and 1263 bp in length, respectively.

For each of the plasmid DNA fragments digested with the restriction enzymes, two different sgRNAs were synthesized (4134bp_sg #1, 4134bp_sg #2, 2570bp_sg #1, 2570bp_sg #2, 1263bp_sg #1, and 1263bp_sg #2). A purification procedure was carried out using the sgRNAs corresponding to target DNAs, singularly or in combination (4134bp_sg #1+2, 2570bp_sg #1+2, and 1263bp_sg #1+2). The nucleotide sequences of the sgRNAs are listed in TABLE 11 below.

TABLE 11
sgRNA          Target sequence         PAM Sequence    SEQ ID NO
4134bp_sg#1    GAGAACCAGACCACCCAGAA    GGG             83
4134bp_sg#2    GGCAGCCCCGCCATCAAGAA    GGG             84
2570bp_sg#1    GTAAGATGCTTTTCTGTGAC    TGG             85
2570bp_sg#2    GATCCTTTGATCTTTTCTAC    GGG             86
1270bp_sg#1    GCCTCCAAAAAAGAAGAGAA    AGG             87
1270bp_sg#2    TGACATCAATTATTATACAT    CGG             88
*The nucleotide sequences of the sgRNAs are identical to those of the target DNA, except for U in place of T.
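As the note to TABLE 11 indicates, the target-recognizing part of each sgRNA is the target DNA sequence with U in place of T. A minimal Python sketch of that conversion, together with a check that the listed PAMs fit the 5′-NGG-3′ form associated with the S. pyogenes Cas9, is given below. The function names are assumptions of this sketch.

```python
def sgrna_from_target(target_dna):
    """Write the target DNA sequence as RNA (U in place of T)."""
    return target_dna.upper().replace("T", "U")


def is_ngg(pam):
    """True for a three-base PAM of the form NGG."""
    pam = pam.upper()
    return len(pam) == 3 and pam[0] in "ACGT" and pam[1:] == "GG"


# Two (target sequence, PAM) rows copied from TABLE 11.
rows = [
    ("GTAAGATGCTTTTCTGTGAC", "TGG"),
    ("TGACATCAATTATTATACAT", "CGG"),
]
for target, pam in rows:
    print(sgrna_from_target(target), pam, is_ngg(pam))
```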

A total of 200 μl of a mixture solution containing DNA:dCas9 protein:sgRNA at a molar ratio of 1:20:100 was incubated at 37° C. for 1.5 hrs. Then, the solution was mixed with 50 μl of Ni-NTA magnetic beads binding specifically to the His tag, and washed twice with 200 μl of a wash buffer, followed by purifying a dCas9-sgRNA-target DNA complex with 200 μl of an eluting buffer (Bioneer, K-7200).

Thereafter, the eluate was incubated at 37° C. for 2 hrs with 0.2 mg/ml RNase A (Amresco, E866) and then at 55° C. for 45 min with 0.2 mg/ml Proteinase K to remove both the sgRNA and the dCas9 protein. The target DNA alone was precipitated in ethanol.

As a result, using the sgRNAs, whether singularly or in combinations of two thereof, for individual target DNAs, desired target DNAs could be isolated from the three DNA fragments of different sizes produced by digestion. In addition, when multiple target DNAs were purified with combinations of sgRNA, such as a total of 4 different sgRNAs for two different target DNAs (2 sgRNAs for each target DNA), the target DNAs were associated with corresponding sgRNAs and thus purified. The results indicate that each target DNA could be isolated at a purity of 95% or higher.

Also, the purification technique is applicable to the Cas protein recognizing the PAM (proto-spacer-adjacent motif) sequence NNNNRYAC (SEQ ID NO: 1) of the present disclosure.

Based on the above description, it should be understood by those skilled in the art that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention without departing from the technical idea or essential features of the invention as defined in the following claims. In this regard, the above-described examples are for illustrative purposes only, and the invention is not intended to be limited by these examples. The scope of the present invention should be understood to comprise all of the modifications or modified forms derived from the meaning and scope of the following claims or equivalent concepts.

1-67 (canceled)
68. A method of altering a genome via targeting a DNA sequence comprising a protospacer adjacent motif (PAM) sequence of NNNNRYAC (SEQ ID NO:1), the method comprising: introducing a Cas protein that recognizes the PAM sequence of SEQ ID NO:1, or a nucleic acid encoding the Cas protein, and; an isolated guide RNA, comprising a sequence capable of forming a duplex with a complementary strand of a target DNA sequence adjacent to a proto-spacer-adjacent motif (PAM) sequence of NNNNRYAC (SEQ ID NO: 1), or a nucleic acid encoding the isolated guide RNA into a cell comprising the genome, thereby altering the genome.
69. The method of claim 68, wherein the Cas protein originates from a microorganism belonging to Campylobacter.
70. The method of claim 69, wherein the microorganism is Campylobacter jejuni.
71. The method of claim 68, wherein the Cas protein is a Cas9 protein.
72. The method of claim 68, introducing a guide RNA is carried out simultaneously or sequentially with the Cas protein that recognizes the PAM sequence of SEQ ID NO: 1.
73. The method of claim 68, wherein the guide RNA is a dual RNA comprising CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA).
74. The method of claim 68, wherein the guide RNA is a single guide RNA (sgRNA).
75. The method of claim 74, wherein the sgRNA comprises a first region containing a sequence capable of forming a duplex with a complementary strand of the target DNA sequence, and a second region containing a sequence that interacts with the Cas protein.
76. The method of claim 74, wherein the sgRNA comprises a portion of a crRNA which contains a sequence capable of forming a duplex with a complementary strand of the target DNA sequence, and a portion of a tracrRNA which contains a sequence interacting with the Cas protein.
77. The method of claim 68, wherein the Cas protein having the nickase activity has a catalytic aspartic acid (D) at position 8 and histidine (H) at position 559 substituted with another amino acid.
78. The method of claim 77, wherein the substituted amino acid is alanine.
79. The method of claim 68, wherein the Cas protein binds to the target DNA sequence comprising the PAM sequence of SEQ ID NO: 1, without cleaving the target DNA.
80. A composition for altering a genome with a target DNA sequence, comprising: an isolated guide RNA or a DNA encoding the isolated guide RNA, and a Cas9 protein that recognizes a PAM sequence of NNNNRYAC (SEQ ID NO:1) or a nucleic acid encoding the Cas9 protein, wherein the target DNA sequence is adjacent to the PAM sequence of NNNNRYAC (SEQ ID NO:1).
81. The composition of claim 80, wherein the Cas protein is a deactivated Cas protein (dCas) or a nucleic acid that encodes the dCas.
82. The composition of claim 80, wherein the isolated guide RNA is a dual RNA comprising a crRNA (CRISPR RNA) and a tracrRNA (trans-activating crRNA).
83. The composition of claim 80, wherein the isolated guide RNA is a single-stranded guide RNA (sgRNA).
84. The composition of claim 83, wherein the sgRNA comprises a first region containing a sequence capable of forming a duplex with a complementary strand of the target DNA sequence, and a second region containing a sequence interacting with the Cas protein.
85. The composition of claim 83, wherein the length of the sequence capable of forming a duplex with a complementary strand of a target DNA sequence is 17 to 23 bp.
86. The composition of claim 83 wherein the guide RNA further comprises one to three additional nucleotides prior to a 5′ end of the sequence capable of forming a duplex with a complementary strand of the target DNA.
87. The composition of claim 86, wherein the additional nucleotides comprise guanine (G).